Abstract

We provide the first application of spectral clustering
to detecting colluding groups in a stock market.
To ensure market efficiency, stock market regulators
take strict action against colluding groups who
manipulate the price of stocks and create artificial
liquidity. In this work, we show how to use existing
machine learning techniques for algorithmically
detecting suspected colluding groups based on stock
market data. A key contribution of this work is in
appropriately defining ‘closeness’ between two traders,
taking into account factors such as common
traders and the frequency, volume and price of a stock
traded between them. In an earlier work, Apte and
Palshikar (2008) applied a non-spectral clustering
technique to the problem, using the volume
of the trades as a measure of closeness. Our experience
with real data suggests that including other
factors in measuring closeness helps detect colluding
groups that would otherwise go undetected. We
demonstrate the effectiveness of our algorithm by
detecting clusters in a random Erdős-Rényi graph with
hidden clusters.

Keywords: Clustering, Collusion, Stock market
regulation.

1 Introduction 

1.1 Trading in a stock exchange A stock
exchange provides a platform for people to trade
stocks of companies that are listed on the exchange.
Suppose a potential buyer X intends to buy a stock
S. So X offers, or bids, a price for one unit of
S. A potential seller Y, who intends to sell S, offers,
or asks, a price for one unit of S. Such offers
by potential buyers or sellers are called orders. If the
bidding price is greater than or equal to the asking
price, then a trade takes place. This means that Y
transfers a certain number of units of S (say, z) to X,
and X pays Y the total money for the z units as per the
matched price. The quantity z is called the volume
of the trade.
The stock exchange does not reveal the identity
of a buyer or a seller. Normally, there are many
buyers and sellers for a stock. For any stock S, their
numbers vary throughout the day. In most markets,
an incoming buy or sell order is either matched
with an existing order or placed in a priority queue,
where priorities are based on the price. For illiquid
stocks, it is possible for a buyer and a seller to plan
together and place orders at a predetermined price
and quantity so that the bidding and asking prices match
exactly. Such trades are called synchronized trades.
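
The matching rule described above can be illustrated with a minimal sketch. It assumes limit orders only, price priority with arrival order as a tie-break, and execution at the price of the resting order; these conventions, and the function name match_orders, are illustrative assumptions rather than the rules of any particular exchange.

```python
import heapq

def match_orders(orders):
    """Match limit orders by price priority.

    orders: iterable of (trader, side, price, quantity), side is "buy" or "sell".
    Returns a list of trades (buyer, seller, price, volume).
    Assumption: a trade executes at the price of the resting (earlier) order.
    """
    bids, asks, trades = [], [], []  # bids is a max-heap via negated prices
    for seq, (trader, side, price, qty) in enumerate(orders):
        if side == "buy":
            # Trade while the best asking price does not exceed the bid.
            while qty > 0 and asks and asks[0][0] <= price:
                ask_price, ask_seq, seller, ask_qty = heapq.heappop(asks)
                volume = min(qty, ask_qty)
                trades.append((trader, seller, ask_price, volume))
                qty -= volume
                if ask_qty > volume:
                    heapq.heappush(asks, (ask_price, ask_seq, seller, ask_qty - volume))
            if qty > 0:  # unmatched remainder waits in the queue
                heapq.heappush(bids, (-price, seq, trader, qty))
        else:
            # Trade while the best bid is at least the asking price.
            while qty > 0 and bids and -bids[0][0] >= price:
                neg_bid, bid_seq, buyer, bid_qty = heapq.heappop(bids)
                volume = min(qty, bid_qty)
                trades.append((buyer, trader, -neg_bid, volume))
                qty -= volume
                if bid_qty > volume:
                    heapq.heappush(bids, (neg_bid, bid_seq, buyer, bid_qty - volume))
            if qty > 0:
                heapq.heappush(asks, (price, seq, trader, qty))
    return trades

# A synchronized trade: Y and X agree beforehand on price 100.0 and quantity 50.
orders = [("Y", "sell", 100.0, 50), ("Z", "sell", 101.0, 10), ("X", "buy", 100.0, 50)]
trades = match_orders(orders)
```

In this toy order book, the pre-arranged orders of X and Y match exactly in price and quantity, while the unrelated order of Z stays in the queue.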

In 2011, the Securities and Exchange Board of India
(SEBI), the stock market regulator of India, initiated
regulatory actions against certain individuals
[1]. As per the report [1], these individuals were
suspected of creating substantial
volumes, which appeared to be artificial in nature,
by executing synchronized and structured trades. This
group of individuals was also found to be increasing
or maintaining prices and providing misleading
signals to the market by artificially injecting volumes
in certain stocks and contributing to the price
movement. Further, SEBI observed that such trades
appeared to be taking place in an unbridled manner.
These traders also trade with other non-colluding
persons.

In this paper, we present an algorithm for this 
problem of identifying such groups of individuals in 
an efficient manner. Throughout the paper, such 
groups are referred to as collusion/colluding groups. 

1.2 Problem formulation Trading in a stock
exchange can be represented as a simple undirected
weighted graph G = (V, E), where each vertex of V
represents a trader in the stock exchange, and there is
an edge between two vertices $v_i$ and $v_j$ of V if (i) there
is a trade between $v_i$ and $v_j$ or (ii) there is a trader
who has traded with both $v_i$ and $v_j$. Every edge
$(v_i, v_j)$ of G is assigned a weight $W_{ij}$. Parameters
such as price movement, number of trades, total
volume and commonality between every pair (i, j) are
used to compute the weights. Since a collusion group
is expected to be closely connected through trades,
the corresponding subsets of vertices of G are called
clusters in G. The problem of identifying collusion
groups in a stock exchange reduces to the problem of
identifying clusters in G.
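
The two edge conditions above can be sketched as follows. The input format (a list of buyer and seller pairs) and the function name build_trader_graph are assumptions made for illustration; edge weights are left out here.

```python
from collections import defaultdict
from itertools import combinations

def build_trader_graph(trades):
    """Return the edge set of G from an iterable of (buyer, seller) pairs.

    Each edge is a frozenset of two trader ids (hypothetical input format).
    """
    partners = defaultdict(set)
    for buyer, seller in trades:
        if buyer != seller:
            partners[buyer].add(seller)
            partners[seller].add(buyer)
    edges = set()
    for trader, others in partners.items():
        # Condition (i): the two traders have traded with each other.
        for other in others:
            edges.add(frozenset((trader, other)))
        # Condition (ii): two traders have both traded with this trader.
        for a, b in combinations(sorted(others), 2):
            edges.add(frozenset((a, b)))
    return edges

edges = build_trader_graph([("a", "b"), ("b", "c"), ("d", "e")])
```

Here a and c never trade directly, but they are joined by an edge because both have traded with b.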


1.3 Our approach For detecting collusion
groups, graph clustering methods have been used
earlier by Palshikar and Apte [7], and Islam et al. [5].
Their algorithms use total volume to compute the
weights between two traders, and they
have been tested on small simulated data. For the
first time, we apply a spectral clustering technique to
the problem. Spectral clustering has been successfully
used to find communities in graphs by White
and Smyth [?]. Moreover, to define closeness
between two vertices of the graph, we use a function
that assigns weights to edges, where the function is
defined in terms of volumes, number of transactions
between two individuals, price movements, and
commonality between traders. Furthermore, we
use the objective metric Q proposed by Newman and
Girvan [5] for choosing the number of colluding
groups. Our algorithm is easy to implement, and it
has been tested on actual SEBI data, showing good
performance in practice. Note that our graph is very
large compared to the graphs used in the earlier
works.

In the next section, we present the spectral clustering
technique used in this paper for locating collusion
groups. In Section 3, we present our algorithm, and in
Section 4 we experiment on the data and present the
results.


2 Spectral Clustering

Spectral clustering is a well-known modern
clustering technique, used for separating large
data sets into groups based on closeness. Let W represent
the weighted adjacency matrix of the weighted graph
G = (V, E) as defined earlier. Let A and B be two
disjoint subsets of V. Let W(A) and W(B) denote
the sum of the weights of the edges of the graphs
induced by A and B respectively. Let W(A, B) denote
the sum of the weights of the edges between A and B.
It is easy to see that if A and B are two different
collusion groups of G, then W(A, B) should have a very
low value, whereas both W(A) and W(B) should have
high values. Intuitively, W(A) measures closeness
amongst the vertices of A. So, it is natural to look
for subsets A and B such that W(A) and W(B) are
maximized and W(A, B) is minimized. Formally, we
wish to locate two such subsets A and B such that
$\frac{W(A,B)}{W(A)}$ and $\frac{W(A,B)}{W(B)}$ together have low values.

Suppose we add $\frac{W(A,B)}{W(A)}$ and $\frac{W(A,B)}{W(B)}$ instead of
considering them separately. We then get an equation
called MinMaxCut, which was introduced by Ding et al.
[2].

(2.1) $\mathrm{MinMaxCut}(A, B) = \frac{W(A,B)}{W(A)} + \frac{W(A,B)}{W(B)}$

The choice of A and B for which
MinMaxCut(A, B) achieves its minimum can
be considered as two clusters. This is one method
of identifying clusters, which we use in this paper.
There are other methods for computing clusters
such as RatioCut [?], NormalizedCut [S] etc. For
our problem of identifying collusion groups, traders
in the same cluster have more transactions with
each other than traders in different
clusters. Eq. (2.1) captures these properties better than
other methods. Eq. (2.1) can be generalized to
Eq. (2.2) for k clusters as follows [S].

(2.2) $\mathrm{MinMaxCut}(A_1, A_2, \ldots, A_k) = \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{W(A_i)}$, where $\bar{A}_i = V \setminus A_i$

The choice of $A_1, A_2, \ldots, A_k$ which minimizes
Eq. (2.2) gives k different clusters. However, the
problem of finding such k subsets of vertices is
NP-hard [10]. Ding et al. [2] showed that the problem
in Eq. (2.2) can be formulated as a trace minimization
problem with relaxation of the constraints as follows.

(2.3) $\min \mathrm{Trace}(H^T L H)$ subject to $H^T D H = I$

Here D is a diagonal matrix such that $d_{ii}$ is the sum
of the weights of the edges of vertex $v_i$, and
$L = I - D^{-1/2} W D^{-1/2}$. The solution of the above problem can
be obtained as a solution of a generalized eigenvalue
problem. In this case, the solution H is obtained by
taking the first k eigenvectors of L, i.e. the eigenvectors
associated with the k smallest eigenvalues of L, as the
columns of H. To convert this real-valued solution into
a discrete one, the k-means algorithm can be used on
the rows of H to obtain k discrete clusters [H].
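
For the special case of two clusters, the relaxation can be illustrated with a minimal pure-Python sketch. It reads L as I − D^{-1/2} W D^{-1/2}, finds the eigenvector of its second-smallest eigenvalue by power iteration (a stand-in for a proper eigensolver), splits the vertices by sign instead of running k-means, and evaluates the two-cluster MinMaxCut objective of Ding et al. on the result. All function names are hypothetical, and the sketch assumes a connected graph with positive degrees and two non-trivial clusters.

```python
import math

def second_eigenvector(W, iters=300):
    """Eigenvector of L = I - D^{-1/2} W D^{-1/2} for its second-smallest eigenvalue."""
    n = len(W)
    s = [math.sqrt(sum(row)) for row in W]       # square roots of the degrees d_ii
    norm_s = math.sqrt(sum(v * v for v in s))
    u1 = [v / norm_s for v in s]                 # eigenvector of L for eigenvalue 0
    x = [(-1.0) ** i * (i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(x, u1))
        x = [a - dot * b for a, b in zip(x, u1)]  # remove the trivial direction
        # Multiply by 2I - L = I + D^{-1/2} W D^{-1/2}, whose eigenvalues are >= 0.
        y = [x[i] + sum(W[i][j] * x[j] / (s[i] * s[j]) for j in range(n))
             for i in range(n)]
        norm_y = math.sqrt(sum(v * v for v in y))
        x = [v / norm_y for v in y]
    return x

def spectral_bipartition(W):
    """Split the vertices by the sign of the second eigenvector."""
    x = second_eigenvector(W)
    A = [i for i, v in enumerate(x) if v >= 0]
    B = [i for i, v in enumerate(x) if v < 0]
    return A, B

def weight_within(W, A):
    """W(A): total weight of edges with both ends in A, each edge counted once."""
    return sum(W[i][j] for i in A for j in A if i < j)

def weight_between(W, A, B):
    """W(A, B): total weight of edges between A and B."""
    return sum(W[i][j] for i in A for j in B)

def min_max_cut(W, A, B):
    """Two-cluster MinMaxCut objective: W(A,B)/W(A) + W(A,B)/W(B)."""
    cut = weight_between(W, A, B)
    return cut / weight_within(W, A) + cut / weight_within(W, B)

# Toy graph: two triangles with heavy internal edges and light cross edges.
W = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.05) for j in range(6)]
     for i in range(6)]
A, B = spectral_bipartition(W)
objective = min_max_cut(W, A, B)
```

On this toy graph the sign pattern of the eigenvector separates the two triangles, which is also the partition with the lowest MinMaxCut value.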

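2.1 Number of clusters To find the number of
clusters in G, we use the modularity function Q
proposed by Newman and Girvan [5]. It is defined
as follows.

(2.4) $Q_k = \sum_{c=1}^{k} \left[ \frac{W(A_c, A_c)}{W(V, V)} - \left( \frac{W(A_c, V)}{W(V, V)} \right)^2 \right]$

Here, following the usual statement of modularity,
$W(X, Y) = \sum_{i \in X, j \in Y} W_{ij}$ for subsets X and Y of V.
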
The optimal number of clusters k can be achieved 
by finding the value of k for which Qk is maximized. 
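
A minimal sketch of this selection step is given below. It assumes the weighted form of the Newman and Girvan modularity in which each cluster contributes its share of the total weight minus the square of the share of weight attached to it; the function names modularity and best_partition are hypothetical.

```python
def modularity(W, clusters):
    """Weighted modularity Q of a partition of the vertices of W.

    W is a symmetric weight matrix (list of lists); clusters is a list of
    lists of vertex indices. Sums run over ordered pairs, so every edge
    is counted twice in both numerator and denominator.
    """
    n = len(W)
    total = sum(sum(row) for row in W)
    q = 0.0
    for cluster in clusters:
        inside = sum(W[i][j] for i in cluster for j in cluster)
        attached = sum(W[i][j] for i in cluster for j in range(n))
        q += inside / total - (attached / total) ** 2
    return q

def best_partition(W, candidates):
    """Pick the candidate partition (one per value of k) with the largest Q."""
    return max(candidates, key=lambda clusters: modularity(W, clusters))

# Two disconnected triangles: the two-cluster partition should win.
W = [[1.0 if i != j and (i < 3) == (j < 3) else 0.0 for j in range(6)]
     for i in range(6)]
candidates = [[[0, 1, 2, 3, 4, 5]], [[0, 1, 2], [3, 4, 5]], [[0, 1], [2, 3], [4, 5]]]
chosen = best_partition(W, candidates)
```

In practice the candidate partitions would be the outputs of the clustering step for successive values of k.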

3 Our Algorithm 

3.1 Computing edge weights Let us first
understand the common features of colluding groups.
Assume that two traders X and Y belong to a
collusion group. It has been observed that X and Y
generally trade with each other several times on the
same stock S within a reasonable period of time d.
Furthermore, during these trades, they tend to trade
a large quantity of S. Sometimes, they even trade at
a price much higher or lower than that of the last
trades, if the purpose is to manipulate the price of the
stock. In addition, X and Y may even trade through a
set of intermediate traders.

Let $T_{ij}$ be the total number of trades of S
between the traders corresponding to $v_i$ and $v_j$ during
d. Observe that $T_{ij}$ can be zero if the corresponding
traders have not traded S during d. Let $T_{\max}$ (or
$T_{\min}$) denote the maximum (respectively, minimum)
value of $T_{ij}$ over all pairs (i, j). So,
$T_{\min} \le T_{ij} \le T_{\max}$. Since $T_{ij}$ is expected to be close to $T_{\max}$
for two traders in a collusion group, the value of
the ratio $\frac{T_{ij} - T_{\min}}{T_{\max} - T_{\min}}$ is used to assess their
closeness. Analogously, the ratios
$\frac{V_{ij} - V_{\min}}{V_{\max} - V_{\min}}$ and $\frac{P_{ij} - P_{\min}}{P_{\max} - P_{\min}}$
are computed to assess their closeness using volumes
and prices respectively. Note that there can be
multiple prices for the multiple trades between the
two traders. So, $P_{ij}$ is the volume-weighted average
price between $v_i$ and $v_j$. Let $N_i$ and $N_j$ be the sets of
neighbors of $v_i$ and $v_j$ respectively. To incorporate
the intermediate traders in $W_{ij}$, we use the fact that,
for two colluding traders, the number of common
neighbors $|N_i \cap N_j|$ is expected to be very close to
the total number of neighbors $|N_i \cup N_j|$. Note that $N_i$
contains $v_i$ and $N_j$ contains $v_j$. Hence, we have the
following formula for computing $W_{ij}$.

(3.5) $W_{ij} = \frac{1}{4} \left( \frac{T_{ij} - T_{\min}}{T_{\max} - T_{\min}} + \frac{V_{ij} - V_{\min}}{V_{\max} - V_{\min}} + \frac{P_{ij} - P_{\min}}{P_{\max} - P_{\min}} + \frac{|N_i \cap N_j|}{|N_i \cup N_j|} \right)$

Observe that the value of $\frac{|N_i \cap N_j|}{|N_i \cup N_j|}$ in the above
equation is one if $v_i$ and $v_j$ share all their neighbors.
Even if the traders corresponding to $v_i$ and $v_j$ are not
trading directly but trade through the same set of
traders, the edge between them receives a non-zero
weight.
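
The weight computation can be sketched as follows, assuming the weight is the average of four terms: the min-max scaled number of trades, volume and volume-weighted average price, and the ratio of common to total neighbors. The input layout, the function names, and the convention of returning 0 when all values of a parameter coincide are illustrative assumptions.

```python
def scaled(value, lo, hi):
    """Min-max scale a value into [0, 1]; 0 if all values coincide (assumption)."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def edge_weights(stats, neighbors):
    """Compute W_ij for every pair in stats.

    stats: {(i, j): (T_ij, V_ij, P_ij)} with number of trades, total volume and
           volume-weighted average price for the pair (hypothetical layout).
    neighbors: {i: set of neighbors of i, including i itself}.
    """
    t_vals = [s[0] for s in stats.values()]
    v_vals = [s[1] for s in stats.values()]
    p_vals = [s[2] for s in stats.values()]
    weights = {}
    for (i, j), (t, v, p) in stats.items():
        common = len(neighbors[i] & neighbors[j]) / len(neighbors[i] | neighbors[j])
        weights[(i, j)] = 0.25 * (
            scaled(t, min(t_vals), max(t_vals))
            + scaled(v, min(v_vals), max(v_vals))
            + scaled(p, min(p_vals), max(p_vals))
            + common
        )
    return weights

stats = {("a", "b"): (7, 500, 50.0), ("b", "c"): (4, 300, 45.0), ("a", "c"): (1, 100, 40.0)}
everyone = {"a", "b", "c"}
weights = edge_weights(stats, {"a": everyone, "b": everyone, "c": everyone})
```

In this toy example all three traders share the same neighborhood, so the commonality term is one for every pair and the weights differ only through the trade count, volume and price terms.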

3.2 Computing Clusters Now, we present the
main steps of the algorithm for locating k clusters
in a graph G.


Algorithm 1 Spectral Graph Clustering
Input: G and k.

For 2 ≤ k ≤ n:

Construct W from G.

Compute D and L.

Compute the first k eigenvectors of L and construct the
matrix $Q \in \mathbb{R}^{n \times k}$ by placing the k eigenvectors as
columns of Q.

Construct the matrix U from Q by assigning
$U_{ij} = Q_{ij} / \sqrt{\sum_{l} Q_{il}^2}$.

For i = 1, ..., n, let $y_i \in \mathbb{R}^k$ be the vector
corresponding to the i-th row of U.

Cluster the points $(y_i)_{i=1,\ldots,n}$ with the k-means
algorithm into clusters $C_1, \ldots, C_k$.

Compute $Q_k$.

Pick the partition which maximizes $Q_k$.
Output: Clusters $A_1, A_2, \ldots, A_k$ with $A_i = \{v_j \mid y_j \in C_i\}$.
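
The last steps of Algorithm 1 can be sketched in plain Python as below, assuming the normalization step scales each row of Q to unit length and taking the eigenvector matrix as given. The k-means routine is a basic Lloyd iteration with a deterministic farthest-point initialization, which is an illustrative choice and not necessarily the variant used in our experiments; all names are hypothetical.

```python
import math

def normalize_rows(Q):
    """Scale every row of Q to unit Euclidean length (zero rows are kept as is)."""
    U = []
    for row in Q:
        norm = math.sqrt(sum(x * x for x in row))
        U.append([x / norm for x in row] if norm > 0 else list(row))
    return U

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest chosen center.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Rows of a hypothetical n x 2 eigenvector matrix Q for six traders.
Q = [[2.0, 0.1], [1.0, 0.0], [3.0, 0.2], [0.1, 1.5], [0.0, 2.0], [0.2, 4.0]]
U = normalize_rows(Q)
labels = kmeans(U, 2)
```

After normalization, the rows lie on the unit circle near two directions, so k-means separates the first three traders from the last three regardless of the original row lengths.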


4 Experiment and Results 

4.1 Market data Market data consist of all
trades of every stock for the entire period on the
two main exchanges in India, namely, the National
Stock Exchange and the Bombay Stock Exchange. The
total number of trades over a period of one year
is more than a billion across all the stocks. Each
trade record contains (i) the codes of the two traders,
(ii) the date and time of the trade, (iii) the stock name,
(iv) the traded price of the stock, and (v) the traded
volume of the stock.

Consider a situation where two traders X and
Y have the same address, the same telephone number,
off-market transactions, or any other common
parameter. Off-market trades are those trades where
stocks are transferred from one account to another
account directly, without the exchange. These trades
indicate that the two individuals know each other and
are trading knowingly. This information can
be used by the regulators to verify the validity of a
colluding group.

Using Algorithm 1, we analyzed trade data for
a period of 17 months. We observed that some
stocks had colluding groups. In many such stocks,
there is usually only one colluding group per stock,
i.e., k = 1. After Algorithm 1 identified a cluster C
for a company, the regulators of the stock markets
verified whether C was indeed a colluding group,
using the parameters of the traders in C mentioned
above. The verification showed that C included most
members of the colluding groups. Since the details
of the results are classified, here we use publicly
available data and simulated data to demonstrate the
performance of our algorithm. For the simulated data,
we assumed that the weights follow a uniform
distribution, and the parameters of the actual data are
used for the simulated data.

4.2 Simulated data We construct a random
graph G(V, E) which is used as an input to
Algorithm 1. Let G(n, p) be a random graph of size n
such that there is an edge between any two vertices
of the graph with probability p [3]. Initialize G(V, E)
by G(n, p). Choose any two subsets $C_1 \subset V$ and
$C_2 \subset V$ of size $n_1$ and $n_2$ respectively. Add edges in
$C_1$ and similarly in $C_2$ such that the graph induced by
$C_1$ is $G(n_1, p_1)$ and that induced by $C_2$ is $G(n_2, p_2)$,
with $p_1, p_2 > p$. The weight matrix W of G is constructed
such that $W_{ij}$ follows a uniform distribution, i.e.
$W_{ij} \sim U(0, 1)$ if $i, j \in C_1$ or $i, j \in C_2$, else
$W_{ij} \sim U(0, b)$ where $b < 1$.
Then G is used as an input to Algorithm 1. The
experiments are repeated many times for various
values of $n, n_1, n_2, p, p_1, p_2, b$. The clusters A, B and
C identified by the algorithm are compared with $C_1$
and $C_2$, and the results are shown below.
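
The construction of the simulated graph can be sketched as follows. Instead of first drawing G(n, p) and then adding edges inside the hidden clusters, the sketch draws each pair once with the appropriate probability, which is assumed here to give the same kind of graph; the function name planted_graph and the seed parameter are hypothetical.

```python
import random

def planted_graph(n, p, clusters, b, seed=0):
    """Random weighted graph with hidden dense clusters.

    clusters: list of (members, p_c) pairs; vertices inside the same cluster
    are joined with probability p_c and weight U(0, 1), all other pairs with
    probability p and weight U(0, b). Returns a symmetric weight matrix.
    """
    rng = random.Random(seed)
    member_of = {}
    for c, (members, _) in enumerate(clusters):
        for v in members:
            member_of[v] = c
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same = i in member_of and member_of.get(j) == member_of[i]
            prob = clusters[member_of[i]][1] if same else p
            if rng.random() < prob:
                w = rng.uniform(0.0, 1.0) if same else rng.uniform(0.0, b)
                W[i][j] = W[j][i] = w
    return W

# Parameters of the example reported in this section.
C1, C2 = range(0, 50), range(50, 110)
W = planted_graph(335, 0.1, [(C1, 0.7), (C2, 0.7)], b=0.4, seed=1)
```

The resulting matrix W can be passed to the clustering steps; the hidden clusters occupy the first 110 vertices only because of how C1 and C2 are chosen here.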



Figure 2: The adjacency matrix obtained after
running the algorithm and permuting the rows and
columns according to the ordering of the second
eigenvector of L. In this matrix, two colluding groups
are clearly visible.


Figure 1 is a pictorial image of the adjacency matrix
of G(V, E) with n = 335, p = .1, $p_1$ = .7, $p_2$ = .7,
$n_1$ = 50, $n_2$ = 60 and b = .4. The ordering of the second
eigenvector of L is used to reorder the rows and
columns, and the pictorial image of the reordered
adjacency matrix is presented in Figure 2.
Figure 3 is the plot of the eigenvalues of L.









Figure 1: Adjacency matrix of G(V, E) for n = 335.


Figure 3: The eigenvalues of L for n = 335, plotted
against their index. The three isolated dots on the left
indicate three clusters in the graph [3].


4.3 Publicly available data We have used the
stock trade data of the Bombay Stock Exchange for
2011. These data contain information such
as the traders' order ids, volume, price and time of the
trade. The order id is a 17-digit number, and for our
experiments we assume that the first 6 digits
correspond to the id of an individual. In Figure 4,
we show the Q values for various values of k for a
stock. We have also compared these with the results
obtained by the algorithm when only one of the
parameters is used, i.e. number of transactions,
volume, price or commonality.



Figure 4: Q values plotted against k. The Q value is
computed using our algorithm. The value $Q_t$
corresponds to the Q value in equation (2.4) when the
number of transactions is the only parameter used to
compute the weights in the graph, i.e. only the first
term in equation (3.5) is considered. Similarly, $Q_v$,
$Q_p$ and $Q_c$ correspond to volume, price and
commonality, i.e. the second, third and fourth terms in
equation (3.5) respectively. The total number of
clusters in this example is 2.


4.4 Financial implications of price manipulation
Price manipulation in the stock market may
impact other financial institutions. For example,
many banks provide loans against stocks, and the
amount of the loan depends on the current price of the
stock. If the price of a stock is manipulated to make
it higher, then the loan amount sanctioned by a
bank can be increased. In case of default, the loan
becomes a non-performing asset for the
bank, since it is difficult for the bank to sell the stock
and recover the full amount of the loan.

There is another reason which may motivate
traders to manipulate the price of a stock. In many
economies, short-term capital losses can be set off
against long-term or short-term capital gains to
compute the taxable income. To understand this, let us
consider two investors A and B. Assume that A has
some long-term taxable income (say, y) from some
business and B has incurred a short-term loss of y.
Suppose A buys some stock S from B at a highly
manipulated price. After the manipulation is stopped,
the price of S falls and S is sold back to B by A at a
lower price. This brings a short-term loss, say x, to the
account of A and a short-term gain of x to the account
of B. This way, through the losses of B, the taxable
income of A is reduced to y − x, and B still does not
have to pay any tax. So the tax, which otherwise would
have gone to the government, gets converted into
black money.

5 Conclusion 

Detecting colluding groups is a challenge for the
regulators of the securities markets. So, building an
automated surveillance system which detects suspect
groups of traders involved in collusion is an important
problem. In this work we have presented an algorithm
which detects such groups. The simulated data is
constructed in such a way that it resembles the
actual data, and our algorithm also performs
well on the simulated data. Hence, our algorithm is
practical for identifying collusion groups.

Spectral clustering can be used for other finance
problems as well, for example, finding clusters of
stocks which are similar. This can be used to diversify
a portfolio. It can also be used to classify
mutual funds into various categories. One way
to formulate the problem is to use the Jaccard
similarity coefficient as the weight between two mutual
funds, where a mutual fund is considered as a set of
stocks.
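
As a small illustration of this formulation, the Jaccard similarity coefficient of two funds can be computed as below. The fund holdings and stock labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two collections, 0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two hypothetical mutual funds, each viewed as a set of stocks.
fund_x = {"S1", "S2", "S3"}
fund_y = {"S2", "S3", "S4", "S5"}
similarity = jaccard(fund_x, fund_y)  # 2 common stocks out of 5 in total
```

The resulting similarities can serve as the entries of the weight matrix W on which the clustering is run.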